Exploratory Data Analysis on UKRAINE RUSSIA War (2022)¶

The EDA on the war will use two datasets.
Important: Each new record is accumulated data from previous days.
Important. Data will be updated daily

We will talk about the following:

  1. Personnel
  2. Prisoner of War
  3. Armored Personnel Carrier
  4. Multiple Rocket Launcher
  5. Aircraft
  6. Anti-aircraft warfare
  7. Drone
  8. Field Artillery
  9. Fuel Tank
  10. Helicopter
  11. Military Auto
  12. Naval Ship
  13. Tank

Source of data:source

Acronyms

  1. POW - Prisoner of War
  2. MRL - Multiple Rocket Launcher
  3. APC - Armored Personnel Carrier
  4. SRBM - Short-range ballistic missile
  5. drones: UAV - Unmanned Aerial Vehicle; RPA - Remotely Piloted Vehicle.

Import the libraries

In [1]:
import json, requests
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

Data Preparation and Cleaning¶

In [2]:
response1 = requests.get('https://raw.githubusercontent.com/PetroIvaniuk/2022-Ukraine-Russia-War-Dataset/main/data/russia_losses_equipment.json')
In [3]:
response1.status_code
Out[3]:
200
In [4]:
dict1 = response1.json()

Dataframe for losses on equipments

In [5]:
df1 = pd.DataFrame(dict1)
In [6]:
df1.head()
Out[6]:
date day aircraft helicopter tank APC field artillery MRL military auto fuel tank drone naval ship anti-aircraft warfare special equipment mobile SRBM system
0 2022-02-25 2 10 7 80 516 49 4 100 60 0 2 0 NaN NaN
1 2022-02-26 3 27 26 146 706 49 4 130 60 2 2 0 NaN NaN
2 2022-02-27 4 27 26 150 706 50 4 130 60 2 2 0 NaN NaN
3 2022-02-28 5 29 29 150 816 74 21 291 60 3 2 5 NaN NaN
4 2022-03-01 6 29 29 198 846 77 24 305 60 3 2 7 NaN NaN
In [7]:
response2 = requests.get('https://raw.githubusercontent.com/PetroIvaniuk/2022-Ukraine-Russia-War-Dataset/main/data/russia_losses_personnel.json')
In [8]:
response2.status_code
Out[8]:
200
In [9]:
dict2 = response2.json()

Dataframe for losses on personnel and POW

In [10]:
df2 = pd.DataFrame(dict2)
In [11]:
df2.head()
Out[11]:
date day personnel personnel* POW
0 2022-02-25 2 2800 about 0
1 2022-02-26 3 4300 about 0
2 2022-02-27 4 4500 about 0
3 2022-02-28 5 5300 about 0
4 2022-03-01 6 5710 about 200

Merge both the dataframes

In [12]:
raw_df = pd.merge(df1,df2)
raw_df.head()
Out[12]:
date day aircraft helicopter tank APC field artillery MRL military auto fuel tank drone naval ship anti-aircraft warfare special equipment mobile SRBM system personnel personnel* POW
0 2022-02-25 2 10 7 80 516 49 4 100 60 0 2 0 NaN NaN 2800 about 0
1 2022-02-26 3 27 26 146 706 49 4 130 60 2 2 0 NaN NaN 4300 about 0
2 2022-02-27 4 27 26 150 706 50 4 130 60 2 2 0 NaN NaN 4500 about 0
3 2022-02-28 5 29 29 150 816 74 21 291 60 3 2 5 NaN NaN 5300 about 0
4 2022-03-01 6 29 29 198 846 77 24 305 60 3 2 7 NaN NaN 5710 about 200
In [13]:
#Drop the column personnel*

raw_df.drop(['personnel*'],axis = 1,inplace = True)
In [14]:
#Capitalize the first letter of every word for column names

raw_df.columns = [i.title() for i in raw_df.columns]
In [15]:
raw_df.columns
Out[15]:
Index(['Date', 'Day', 'Aircraft', 'Helicopter', 'Tank', 'Apc',
       'Field Artillery', 'Mrl', 'Military Auto', 'Fuel Tank', 'Drone',
       'Naval Ship', 'Anti-Aircraft Warfare', 'Special Equipment',
       'Mobile Srbm System', 'Personnel', 'Pow'],
      dtype='object')
In [16]:
#Rename the columns

raw_df = raw_df.rename(columns = {'Field Artillery':'Field_Artillery','Military Auto':'Military_Auto','Fuel Tank':'Fuel_Tank',
                    'Naval Ship':'Naval_Ship','Anti-Aircraft Warfare':'Anti_Aircraft_Warfare','Special Equipment':'Special_Equipment',
                   'Mobile Srbm System':'Mobile_Srbm_System','Apc':'APC','Mrl':'MRL' })
In [17]:
#Calculate the total of equipments lost per day

raw_df['Equipment_Per_Day'] = raw_df[raw_df.columns[2:15]].sum(axis = 1)
In [18]:
raw_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 43 entries, 0 to 42
Data columns (total 18 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Date                   43 non-null     object 
 1   Day                    43 non-null     object 
 2   Aircraft               43 non-null     int64  
 3   Helicopter             43 non-null     int64  
 4   Tank                   43 non-null     int64  
 5   APC                    43 non-null     int64  
 6   Field_Artillery        43 non-null     int64  
 7   MRL                    43 non-null     int64  
 8   Military_Auto          43 non-null     int64  
 9   Fuel_Tank              43 non-null     int64  
 10  Drone                  43 non-null     int64  
 11  Naval_Ship             43 non-null     int64  
 12  Anti_Aircraft_Warfare  43 non-null     int64  
 13  Special_Equipment      25 non-null     float64
 14  Mobile_Srbm_System     16 non-null     float64
 15  Personnel              43 non-null     int64  
 16  Pow                    43 non-null     int64  
 17  Equipment_Per_Day      43 non-null     float64
dtypes: float64(3), int64(13), object(2)
memory usage: 6.4+ KB
In [19]:
#Convert date to datetime

raw_df['Date'] = pd.to_datetime(raw_df['Date'])
raw_df['Day'] = raw_df['Day'].astype(int)
In [20]:
#Sort the null values in descending order

raw_df.isna().sum().sort_values(ascending = False)
Out[20]:
Mobile_Srbm_System       27
Special_Equipment        18
Date                      0
Day                       0
Pow                       0
Personnel                 0
Anti_Aircraft_Warfare     0
Naval_Ship                0
Drone                     0
Fuel_Tank                 0
Military_Auto             0
MRL                       0
Field_Artillery           0
APC                       0
Tank                      0
Helicopter                0
Aircraft                  0
Equipment_Per_Day         0
dtype: int64
In [21]:
raw_df['Special_Equipment'].unique()
Out[21]:
array([nan, 10., 11., 12., 13., 15., 16., 18., 19., 21., 22., 24., 25.])
In [22]:
raw_df['Mobile_Srbm_System'].unique()
Out[22]:
array([nan,  2.,  4.])
In [23]:
#Calculate the percentage of null values in our columns


missing_percentages = raw_df.isna().sum().sort_values(ascending = False)/len(raw_df)*100
missing_percentages
Out[23]:
Mobile_Srbm_System       62.790698
Special_Equipment        41.860465
Date                      0.000000
Day                       0.000000
Pow                       0.000000
Personnel                 0.000000
Anti_Aircraft_Warfare     0.000000
Naval_Ship                0.000000
Drone                     0.000000
Fuel_Tank                 0.000000
Military_Auto             0.000000
MRL                       0.000000
Field_Artillery           0.000000
APC                       0.000000
Tank                      0.000000
Helicopter                0.000000
Aircraft                  0.000000
Equipment_Per_Day         0.000000
dtype: float64
In [24]:
# Plot a horizontal bar graph of only null values



percentage_plot = missing_percentages[missing_percentages != 0 ].plot.bar(rot = 0,figsize = (10,10),color = 'lightblue')
plt.ylabel('Missing Values',size = 13)
plt.title(' Missing % of null values for equipments', size = 15)
plt.grid(axis = 'y',linestyle = '--',alpha = 0.8)
plt.bar_label(percentage_plot.containers[0],size = 10)
Out[24]:
[Text(0, 0, '62.7907'), Text(0, 0, '41.8605')]
In [25]:
#Check for null rows

null_columns = raw_df[['Mobile_Srbm_System','Special_Equipment']][raw_df.isnull().any(axis = 1)]



We choose to replace null values with zero as data has not been provided for "NaN" values.

In [26]:
raw_df.fillna(0,inplace = True)
In [27]:
raw_df.isnull().any()
Out[27]:
Date                     False
Day                      False
Aircraft                 False
Helicopter               False
Tank                     False
APC                      False
Field_Artillery          False
MRL                      False
Military_Auto            False
Fuel_Tank                False
Drone                    False
Naval_Ship               False
Anti_Aircraft_Warfare    False
Special_Equipment        False
Mobile_Srbm_System       False
Personnel                False
Pow                      False
Equipment_Per_Day        False
dtype: bool
In [28]:
#We have no duplicated values

raw_df.duplicated().sum()
Out[28]:
0
In [29]:
#Total losses of equipment

col_list = list(raw_df.columns[2:15])

Total = []

for i in raw_df.columns[2:15]:
    Total.append(raw_df[i].sum())


Equipment = pd.DataFrame({'Equipment': col_list,'Total': Total})
Equipment
Out[29]:
Equipment Total
0 Aircraft 3927.0
1 Helicopter 4201.0
2 Tank 19575.0
3 APC 60021.0
4 Field_Artillery 9149.0
5 MRL 3090.0
6 Military_Auto 36258.0
7 Fuel_Tank 2866.0
8 Drone 1756.0
9 Naval_Ship 188.0
10 Anti_Aircraft_Warfare 1619.0
11 Special_Equipment 493.0
12 Mobile_Srbm_System 62.0
In [30]:
#Total losses in personnel from our raw dataset

print(f'Total personnel losses: {raw_df.Personnel.sum()}')
Total personnel losses: 589150

Note: The dataset conatins a combination of equipments lost for both entities.

Before carrying out any analysis let us cross check our data with real estimates of Russian firepower in comparison with columns in our dataset.

Personnel / Manpower:
Total Military Personnel(est.) : 1,350,000
Total Military Personnel can be broken down into three categories:

  1. Active Personnel: 850,000
  2. Reserve Personnel: 250,000
  3. Paramilitary: 250,000

Airpower:
Total Aircraft strength:4,173

  1. Interceptors (Combat Aircraft): 772
  2. Helicopters: 1,543
  3. Dedicated Attack (Total Attack aircraft): 739
  4. Transports (Fixed wing transport): 445
  5. Trainers (Military trainer aircraft): 522
  6. Special Mission (Specil mission aircraft): 132
  7. Tanker fleet (Tanker fleet aircraft): 20
  8. Attack Helicopters (Military attack helicopters): 544

Land Forces:

  1. Tanks : 12,420
  2. Armored Vehicles: 30,122
  3. Self-propelled artillery: 6,574
  4. Towed Artillery: 7,571
  5. Rocket Projectors: 3,391

Naval Forces:

  1. Total Assets: 605
  2. Aircraft Carriers: 1
  3. Destroyers: 15
  4. Frigates: 11
  5. Corvettes: 86
  6. Submarines: 70
  7. Patrol Vessels: 59
  8. Mine Warfare: 49 source

NOTE: COMPARE THIS DATA WITH THE TOTAL LOSSES RECORDED IN OUR DATASET

Our losses of helicopters in is far more than the inventory of helicopters in Russia.
The Ukrainian Air Force has: 69 fighters 29 dedicated attack aircraft 32 transports 71 trainers 5 aircraft designated as special mission 112 helicopters source
Combined inventory of helicopters is still less than the losses we have recorded.
The same can be said about the personnel lost.

This is because every row in our dataframe returns the total number of a particular equipment and personnel loss recorded up to that date

SO WHAT TO DO NOW????¶

IN ORDER TO PROCEED our first step will be to create a revised dataframe that will contain the equipment and personnel lossesfor every 24hrs.

In [31]:
#Assign the list of dates and days to two different variables
date = list(raw_df['Date'])
day = list(raw_df['Day'])
In [32]:
# diff() will be applied to columns in order to analyze the looses every 24hrs


df = raw_df[['Aircraft', 'Helicopter', 'Tank', 'APC',
             'Field_Artillery', 'MRL', 'Military_Auto', 'Fuel_Tank', 'Drone',
             'Naval_Ship', 'Anti_Aircraft_Warfare', 'Special_Equipment',
             'Mobile_Srbm_System', 'Personnel', 'Pow', 'Equipment_Per_Day']].diff()
In [33]:
df['Date'] = date
df['Day'] = day
In [34]:
# Create a dataframe that will contain the values for every 24hrs


df = df.reindex(columns = ['Date', 'Day', 'Aircraft', 'Helicopter', 'Tank', 'APC',
                           'Field_Artillery', 'MRL', 'Military_Auto', 'Fuel_Tank', 'Drone',
                           'Naval_Ship', 'Anti_Aircraft_Warfare', 'Special_Equipment',
                           'Mobile_Srbm_System', 'Personnel', 'Pow', 'Equipment_Per_Day'])
df.head()
Out[34]:
Date Day Aircraft Helicopter Tank APC Field_Artillery MRL Military_Auto Fuel_Tank Drone Naval_Ship Anti_Aircraft_Warfare Special_Equipment Mobile_Srbm_System Personnel Pow Equipment_Per_Day
0 2022-02-25 2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 2022-02-26 3 17.0 19.0 66.0 190.0 0.0 0.0 30.0 0.0 2.0 0.0 0.0 0.0 0.0 1500.0 0.0 324.0
2 2022-02-27 4 0.0 0.0 4.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 200.0 0.0 5.0
3 2022-02-28 5 2.0 3.0 0.0 110.0 24.0 17.0 161.0 0.0 1.0 0.0 5.0 0.0 0.0 800.0 0.0 323.0
4 2022-03-01 6 0.0 0.0 48.0 30.0 3.0 3.0 14.0 0.0 0.0 0.0 2.0 0.0 0.0 410.0 200.0 100.0
In [35]:
#Save the values in an object
val = raw_df.iloc[0:1,2:18]


#Use the object to replace NaN
df.iloc[0:1,2:18] = val
In [36]:
df.describe()
Out[36]:
Day Aircraft Helicopter Tank APC Field_Artillery MRL Military_Auto Fuel_Tank Drone Naval_Ship Anti_Aircraft_Warfare Special_Equipment Mobile_Srbm_System Personnel Pow Equipment_Per_Day
count 43.000000 43.000000 43.000000 43.000000 43.000000 43.000000 43.000000 43.000000 43.000000 43.000000 43.000000 43.000000 43.000000 43.000000 43.000000 43.000000 43.000000
mean 24.418605 3.534884 3.186047 16.790698 44.441860 7.953488 2.511628 32.186047 1.767442 2.604651 0.162791 1.279070 0.581395 0.093023 448.837209 10.860465 117.093023
std 13.227375 3.984188 4.811968 16.834712 82.239575 10.823153 3.948309 40.733311 9.226869 3.506046 0.484534 1.816925 1.607017 0.426165 675.207937 32.499510 142.962002
min 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 5.000000
25% 13.500000 1.000000 0.000000 7.500000 15.500000 2.000000 0.000000 13.500000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 100.000000 0.000000 60.500000
50% 25.000000 2.000000 2.000000 13.000000 23.000000 5.000000 1.000000 22.000000 0.000000 2.000000 0.000000 1.000000 0.000000 0.000000 200.000000 0.000000 88.000000
75% 35.500000 4.500000 3.000000 16.500000 37.500000 10.000000 3.500000 34.500000 0.000000 3.000000 0.000000 2.000000 1.000000 0.000000 350.000000 9.000000 101.500000
max 46.000000 17.000000 20.000000 80.000000 516.000000 51.000000 17.000000 224.000000 60.000000 15.000000 2.000000 8.000000 10.000000 2.000000 3160.000000 200.000000 828.000000

Check for outliers¶

In [37]:
def boxplot_plot(x):
    
    for i in df.columns[2:]:
        fig = plt.figure(figsize = (8,8))
        x.boxplot(column = [i])
        plt.grid(False)
        plt.title(i+ ' outliers',fontstyle = 'italic')
        plt.grid(axis = 'y',linestyle = '--',alpha = 0.5)
        plt.tight_layout()
        plt.show()
In [38]:
boxplot_plot(df)

Create a distplot for equipments, personnel and POW

In [39]:
def distplot_plot(x):
    
        for i in df.columns[2:]:
            fig = plt.figure(figsize = (15,5))
    
            sns.distplot(x[i])
    
            mean = x[i].mean()
    
            median = x[i].median()
    
            Q1 =  x[i].quantile(0.25) #25th percentile
    
            Q3 =  x[i].quantile(0.75) #75th percentile
    

    
            plt.axvline(mean,color = 'red',label = 'Mean')
            plt.axvline(median,color = 'green',label = 'Median')
            plt.axvline(Q1,color = 'yellow',label = '25th_ercentile')
            plt.axvline(Q3,color = 'blue',label = '75th_percentile')
            plt.legend()
            plt.title('Distribution of ' + i,fontstyle = 'italic',fontsize = 12) 
            plt.grid(axis = 'y',linestyle = '--',alpha = 0.5)
            plt.show()
        
distplot_plot(df)



We will avoid treating the outliers as any manipultaion of outliers will lead to a significant loss of data for different dates

In our case we need numerical values to accurately analyze the trend for different types of equipments, personnel and POW

Exploratory Analysis and Visualization¶

In [40]:
#Total losses in personnel till the last date of our dataset

print(f'Total personnel losses after calculating the diff(): {df.Personnel.sum()}')
Total personnel losses after calculating the diff(): 19300.0
In [41]:
#Total losses in equipment till the last date of our dataset

col_list = list(df.columns[2:15])

Total = []

for i in df.columns[2:15]:
    Total.append(df[i].sum())


Equipment_diff= pd.DataFrame({'Equipment_diff': col_list,'Total': Total})
Equipment_diff
Out[41]:
Equipment_diff Total
0 Aircraft 152.0
1 Helicopter 137.0
2 Tank 722.0
3 APC 1911.0
4 Field_Artillery 342.0
5 MRL 108.0
6 Military_Auto 1384.0
7 Fuel_Tank 76.0
8 Drone 112.0
9 Naval_Ship 7.0
10 Anti_Aircraft_Warfare 55.0
11 Special_Equipment 25.0
12 Mobile_Srbm_System 4.0

On which date the equipments lost was the highest and the lowest?¶

In [42]:
#Create a line plot

fig = px.line(df,x = 'Date', y = 'Equipment_Per_Day',title = 'Trend in equipments lost')
fig.add_hline(y = df['Equipment_Per_Day'].max(),line_dash="dash", line_color="red",annotation_text="Max. val",opacity = 0.4)
fig.add_hline(y = df['Equipment_Per_Day'].min(),line_dash="dash", line_color="green",annotation_text="Min. val",opacity = 0.4)


#Update trace and layout
fig.update_traces(mode="markers+lines")
fig.update_layout(hovermode="x unified")
fig.show()


# Calculate the date for which the losses in equipments was the highest and the lowest
Max_val = df[['Date','Equipment_Per_Day']][df['Equipment_Per_Day'] == df['Equipment_Per_Day'].max()]
Min_val =  df[['Date','Equipment_Per_Day']][df['Equipment_Per_Day'] == df['Equipment_Per_Day'].min()]


display(Max_val,Min_val)
Date Equipment_Per_Day
0 2022-02-25 828.0
Date Equipment_Per_Day
2 2022-02-27 5.0



So far the maximum amount of losses have been recorded on 2022-02-25 and the minimum on 2022-02-27. This will be subjected changes that will be dependent on data update.

Which equipment had the maximum and minimum amount of losses?¶

Equipment with least losses

In [43]:
least_used_equipment = Equipment_diff[Equipment_diff['Total'] == Equipment_diff['Total'].min()]
least_used_equipment
Out[43]:
Equipment_diff Total
12 Mobile_Srbm_System 4.0



Equipment with min. losses: Mobile Short Range Ballistic Missile (SRBM) System

Equipment with most losses

In [44]:
most_used_equipment = Equipment_diff[Equipment_diff['Total'] == Equipment_diff['Total'].max()]
most_used_equipment
Out[44]:
Equipment_diff Total
3 APC 1911.0



Equipment with max. losses : APC

In [45]:
fig = px.bar(Equipment,x = 'Equipment',y = 'Total',log_y = True,title = 'Total of each equipment used',height = 700,
            hover_data = ['Equipment','Total'],color = 'Equipment')
fig.show()

Plot for all equipments lost over time¶

In [46]:
fig = px.bar(df,x = 'Date',y = ['Aircraft', 'Helicopter', 'Tank', 'APC',
                                'Field_Artillery', 'MRL', 'Military_Auto', 'Fuel_Tank', 'Drone',
                                'Naval_Ship', 'Anti_Aircraft_Warfare', 'Special_Equipment',
                                'Mobile_Srbm_System'],log_y = True,
            height= 1000)

fig.update_layout(title = 'Equipments vs Dates',
                 xaxis_title = 'Date',
                 yaxis_title = 'Frequency of Equipments',
                 hovermode="x unified")

fig.show()

Trend for each equipment lost¶


We create a function that comprises of three different operations:

  1. The line plot is to identify the trend for every type of loss for every single day of the ongoing war.
  2. The bar plot is to display the percentage drop or rise of losses from their previous day.
  3. We also return a dataframe that show us the amount of equipment lost and their respective percentage rise or drop in comparison to the previous day of the war.
In [47]:
def analysis(eq):
    
    #Line plot
    fig = px.line(df,x = 'Date',y = eq ,height = 450)
    
    fig.update_layout(title = eq + ' % loss of over time',
                     xaxis_title = 'Date',
                     yaxis_title = 'Frequency',
                     hovermode="x unified")

    
    fig.add_hline(y=df[eq].max(), line_dash="dot",
              annotation_text="Max. Value", 
              annotation_position="bottom right",
              line_color="green")
                  
                  
    fig.add_hline(y=df[eq].min(), line_dash="dot",
              annotation_text='Min. Value', 
              annotation_position="bottom right",
              line_color="red") 
    
    fig.show()

    print(f' Max.loss: {df[eq].max()}')
    print(f' Min.loss: {df[eq].min()}')
        
    #Create the dataframe
    date = df.Date
    losses = df[eq]


    a = losses.diff()
    df_new = pd.DataFrame({'Date': date,'Loss': losses,'Loss_percentage':a})
    
    df_new['Loss_percentage'] = (df_new['Loss_percentage']/df_new['Loss'])*100
    df_new['Loss_percentage'].replace(np.nan,0,inplace = True)
    df_new['Loss_percentage'].replace(-np.inf,-100,inplace = True) 
    
    #Bar plot
    fig = px.bar(df_new,x = 'Date',y = 'Loss_percentage',color = 'Loss',height = 600,title = '% losses over time')
    fig.update_layout(hovermode="x unified")
    fig.show()
    
    #Return the dataframe
    return df_new 

AIRCRAFTS

In [48]:
Aircraft = analysis('Aircraft')
 Max.loss: 17.0
 Min.loss: 0.0
In [49]:
Aircraft.head()
Out[49]:
Date Loss Loss_percentage
0 2022-02-25 10.0 0.000000
1 2022-02-26 17.0 41.176471
2 2022-02-27 0.0 -100.000000
3 2022-02-28 2.0 100.000000
4 2022-03-01 0.0 -100.000000

HELICOPTER

In [50]:
Helicopter = analysis('Helicopter')
 Max.loss: 20.0
 Min.loss: 0.0
In [51]:
Helicopter.head()
Out[51]:
Date Loss Loss_percentage
0 2022-02-25 7.0 0.000000
1 2022-02-26 19.0 63.157895
2 2022-02-27 0.0 -100.000000
3 2022-02-28 3.0 100.000000
4 2022-03-01 0.0 -100.000000

TANK

In [52]:
Tank = analysis('Tank')
 Max.loss: 80.0
 Min.loss: 0.0
In [53]:
Tank.head()
Out[53]:
Date Loss Loss_percentage
0 2022-02-25 80.0 0.000000
1 2022-02-26 66.0 -21.212121
2 2022-02-27 4.0 -1550.000000
3 2022-02-28 0.0 -100.000000
4 2022-03-01 48.0 100.000000

APC

In [54]:
APC = analysis('APC')
 Max.loss: 516.0
 Min.loss: 0.0
In [55]:
APC.head()
Out[55]:
Date Loss Loss_percentage
0 2022-02-25 516.0 0.000000
1 2022-02-26 190.0 -171.578947
2 2022-02-27 0.0 -100.000000
3 2022-02-28 110.0 100.000000
4 2022-03-01 30.0 -266.666667

FIELD ARTILLERY

In [56]:
Field_Artillery= analysis('Field_Artillery')
 Max.loss: 51.0
 Min.loss: 0.0
In [57]:
Field_Artillery.head()
Out[57]:
Date Loss Loss_percentage
0 2022-02-25 49.0 0.000000
1 2022-02-26 0.0 -100.000000
2 2022-02-27 1.0 100.000000
3 2022-02-28 24.0 95.833333
4 2022-03-01 3.0 -700.000000

MRL

In [58]:
MRL = analysis('MRL')
 Max.loss: 17.0
 Min.loss: 0.0
In [59]:
MRL.head()
Out[59]:
Date Loss Loss_percentage
0 2022-02-25 4.0 0.000000
1 2022-02-26 0.0 -100.000000
2 2022-02-27 0.0 0.000000
3 2022-02-28 17.0 100.000000
4 2022-03-01 3.0 -466.666667

MILITARY AUTO

In [60]:
Military_Auto = analysis('Military_Auto')
 Max.loss: 224.0
 Min.loss: 0.0
In [61]:
Military_Auto.head()
Out[61]:
Date Loss Loss_percentage
0 2022-02-25 100.0 0.000000
1 2022-02-26 30.0 -233.333333
2 2022-02-27 0.0 -100.000000
3 2022-02-28 161.0 100.000000
4 2022-03-01 14.0 -1050.000000

FUEL TANK

In [62]:
Fuel_Tank = analysis('Fuel_Tank')
 Max.loss: 60.0
 Min.loss: 0.0
In [63]:
Fuel_Tank.head()
Out[63]:
Date Loss Loss_percentage
0 2022-02-25 60.0 0.0
1 2022-02-26 0.0 -100.0
2 2022-02-27 0.0 0.0
3 2022-02-28 0.0 0.0
4 2022-03-01 0.0 0.0

DRONES

In [64]:
Drones = analysis('Drone')
 Max.loss: 15.0
 Min.loss: 0.0
In [65]:
Drones.head()
Out[65]:
Date Loss Loss_percentage
0 2022-02-25 0.0 0.0
1 2022-02-26 2.0 100.0
2 2022-02-27 0.0 -100.0
3 2022-02-28 1.0 100.0
4 2022-03-01 0.0 -100.0

NAVAL SHIP

In [66]:
Naval_Ship = analysis('Naval_Ship')
 Max.loss: 2.0
 Min.loss: 0.0
In [67]:
Naval_Ship.head()
Out[67]:
Date Loss Loss_percentage
0 2022-02-25 2.0 0.0
1 2022-02-26 0.0 -100.0
2 2022-02-27 0.0 0.0
3 2022-02-28 0.0 0.0
4 2022-03-01 0.0 0.0

ANTI AIRCRAFT WARFARE

In [68]:
Anti_Aircraft_Warfare = analysis('Anti_Aircraft_Warfare')
 Max.loss: 8.0
 Min.loss: 0.0
In [69]:
Anti_Aircraft_Warfare.head()
Out[69]:
Date Loss Loss_percentage
0 2022-02-25 0.0 0.0
1 2022-02-26 0.0 0.0
2 2022-02-27 0.0 0.0
3 2022-02-28 5.0 100.0
4 2022-03-01 2.0 -150.0

SPECIAL EQUIPMENT

In [70]:
Special_Equipment = analysis('Special_Equipment')
 Max.loss: 10.0
 Min.loss: 0.0
In [71]:
Special_Equipment.head()
Out[71]:
Date Loss Loss_percentage
0 2022-02-25 0.0 0.0
1 2022-02-26 0.0 0.0
2 2022-02-27 0.0 0.0
3 2022-02-28 0.0 0.0
4 2022-03-01 0.0 0.0

MOBILE SRBM SYSTEM

In [72]:
Mobile_SRBM_System = analysis('Mobile_Srbm_System')
 Max.loss: 2.0
 Min.loss: 0.0
In [73]:
Mobile_SRBM_System.head()
Out[73]:
Date Loss Loss_percentage
0 2022-02-25 0.0 0.0
1 2022-02-26 0.0 0.0
2 2022-02-27 0.0 0.0
3 2022-02-28 0.0 0.0
4 2022-03-01 0.0 0.0

PERSONNEL

In [74]:
Personnel = analysis('Personnel')
 Max.loss: 3160.0
 Min.loss: 0.0

PRISONER OF WAR

In [75]:
POW = analysis('Pow')
 Max.loss: 200.0
 Min.loss: 0.0

Total Personnel Losses and POW¶

In [76]:
Personnel_pow_plt = df[['Personnel','Pow']].sum().plot.bar(rot = 0,figsize = (10,10),cmap = 'plasma')
plt.ylabel('Sum',fontsize = 13)
plt.title('Personnel and POW loss',fontsize = 15)
plt.grid(axis = 'y',linestyle = '--',alpha = 0.8)
plt.bar_label(Personnel_pow_plt.containers[0],size = 10)
plt.show()

Aerial eqiupments during war¶

Create a database for the aerial equipments lost.

In [77]:
Date = df.Date
Aircraft =  df.Aircraft
Helicopter = df.Helicopter
Drone = df.Drone
Anti_Aircraft_Warfare = df.Anti_Aircraft_Warfare
Mobile_Srbm_System = df.Mobile_Srbm_System

df_aerial = pd.DataFrame({'Date': Date,'Aircaft':Aircraft,'Helicopter':Helicopter,'Drone':Drone,
                          'Anti_Aircraft_Warfare':Anti_Aircraft_Warfare,'Mobile_Srbm_System':Mobile_Srbm_System})
df_aerial['Total_Losses_Aerial'] = df_aerial[df_aerial.columns[1:6]].sum(axis = 1) 
df_aerial.head()
Out[77]:
Date Aircaft Helicopter Drone Anti_Aircraft_Warfare Mobile_Srbm_System Total_Losses_Aerial
0 2022-02-25 10.0 7.0 0.0 0.0 0.0 17.0
1 2022-02-26 17.0 19.0 2.0 0.0 0.0 38.0
2 2022-02-27 0.0 0.0 0.0 0.0 0.0 0.0
3 2022-02-28 2.0 3.0 1.0 5.0 0.0 11.0
4 2022-03-01 0.0 0.0 0.0 2.0 0.0 2.0



We will create a new dataframe from our dataframe of losses for aerial equipments.
This dataframe will help us see the contribution of different aerial equipments (percentage) towards the total of aerial equipments lost

In [78]:
col_list = list(df_aerial.columns[1:6])

Percentage = []

for i in df_aerial.columns[1:6]:
    Percentage.append(round((df_aerial[i].sum()/df_aerial['Total_Losses_Aerial'].sum())*100,1))

df_aerial_percentage = pd.DataFrame({'Aerial Equipment': col_list, '%_of_sum_loss_aerial':Percentage})
df_aerial_percentage
Out[78]:
Aerial Equipment %_of_sum_loss_aerial
0 Aircaft 33.0
1 Helicopter 29.8
2 Drone 24.3
3 Anti_Aircraft_Warfare 12.0
4 Mobile_Srbm_System 0.9
In [79]:
fig = px.pie(df_aerial_percentage, values='%_of_sum_loss_aerial', 
             names='Aerial Equipment', color_discrete_sequence=px.colors.sequential.Blues)
fig.show()

Ground equipments during war¶

Create a database for the ground equipments lost.

In [80]:
Date = df.Date
Tank = df.Tank
APC = df.APC
Field_Artillery = df.Field_Artillery
MRL = df.MRL
Military_Auto = df.Military_Auto
Fuel_Tank = df.Fuel_Tank

df_ground = pd.DataFrame({'Date':Date,'Tank': Tank,'APC': APC,'Field_Artillery':Field_Artillery,
                          'MRL':MRL,'Military_Auto':Military_Auto,'Fuel_Tank':Fuel_Tank})
df_ground['Total_Losses_Ground'] =df_ground[df_ground.columns[1:7]].sum(axis = 1)
df_ground.head()
Out[80]:
Date Tank APC Field_Artillery MRL Military_Auto Fuel_Tank Total_Losses_Ground
0 2022-02-25 80.0 516.0 49.0 4.0 100.0 60.0 809.0
1 2022-02-26 66.0 190.0 0.0 0.0 30.0 0.0 286.0
2 2022-02-27 4.0 0.0 1.0 0.0 0.0 0.0 5.0
3 2022-02-28 0.0 110.0 24.0 17.0 161.0 0.0 312.0
4 2022-03-01 48.0 30.0 3.0 3.0 14.0 0.0 98.0



We will create a new dataframe from our dataframe of losses for ground equipments.
This dataframe will help us see the contribution of different ground equipments (percentage) towards the total of ground equipments lost

In [81]:
col_list = list(df_ground.columns[1:7])

Percentage = []

for i in df_ground.columns[1:7]:
    Percentage.append(round((df_ground[i].sum()/df_ground['Total_Losses_Ground'].sum())*100,1))

df_ground_percentage = pd.DataFrame({'Ground Equipment': col_list, '%_of_sum_loss_ground':Percentage})
df_ground_percentage
Out[81]:
Ground Equipment %_of_sum_loss_ground
0 Tank 15.9
1 APC 42.1
2 Field_Artillery 7.5
3 MRL 2.4
4 Military_Auto 30.5
5 Fuel_Tank 1.7
In [82]:
fig = px.pie(df_ground_percentage, values='%_of_sum_loss_ground', 
             names='Ground Equipment', color_discrete_sequence=px.colors.sequential.RdBu)
fig.show()

Naval equipments during war¶

Create a database for the naval equipments lost.

In [83]:
Date = df.Date
Naval = df.Naval_Ship

df_naval = pd.DataFrame({'Date': Date,'Naval': Naval})
df_naval.head()
Out[83]:
Date Naval
0 2022-02-25 2.0
1 2022-02-26 0.0
2 2022-02-27 0.0
3 2022-02-28 0.0
4 2022-03-01 0.0
In [84]:
#Plot a line plot to analyze ground, aerial and naval equipments lost

fig = px.line(x = df.Date, y = [df_aerial.Total_Losses_Aerial,df_ground.Total_Losses_Ground,df_naval.Naval])
names={'wide_variable_0':'Aerial Equipments','wide_variable_1':'Ground Equipments ','wide_variable_2':'Naval Equipments'}
fig.for_each_trace(lambda x: x.update(name = names[x.name]))
fig.update_layout(hovermode="x unified",yaxis_title = 'Frequency',title = 'Aerial Equipments vs Ground Equipments vs Naval Equipments')
fig.show()

The war signals towards a heavy loss of ground equipments. Whereas the use of naval equipments is almost null. Let us calculate the total for each category.

In [85]:
df_eq = pd.DataFrame()
df_eq['Total_Naval_Losses'] = df_naval.Naval
df_eq['Total_Ground_losses'] = df_ground.Total_Losses_Ground
df_eq['Total_Aerial_Losses'] = df_aerial.Total_Losses_Aerial
eq_plt = df_eq[['Total_Naval_Losses','Total_Ground_losses','Total_Aerial_Losses']].sum().plot.bar(rot = 0,figsize = (10,10),cmap = 'summer')
plt.ylabel('Sum',fontsize = 13)
plt.title('Sum for categories of equipment',fontsize = 15)
plt.grid(axis = 'y',linestyle = '--',alpha = 0.8)
plt.bar_label(eq_plt.containers[0],size = 10)
plt.semilogy()
plt.show()

We can see that the amount of losses of ground equipment is far more than the losses in other categories till now.